Implementing Efficient MPI on LAPI for IBM RS/6000 SP Systems: Experiences and Performance Evaluation
Authors
Abstract
The IBM RS/6000 SP system is one of the most cost-effective commercially available high-performance machines. IBM RS/6000 SP systems support the Message Passing Interface (MPI) standard and LAPI. LAPI is a low-level, reliable, and efficient one-sided communication API library implemented on IBM RS/6000 SP systems. This paper explains how the high performance of the LAPI library has been exploited to implement the MPI standard more efficiently than the existing MPI implementation. It describes how such an implementation can avoid unnecessary data copies on both the sending and receiving sides. It also discusses how problems arising from mismatches between the requirements of the MPI standard and the features of LAPI were resolved. As a result of this exercise, certain enhancements to LAPI are identified that enable an efficient implementation of MPI on LAPI. The performance of the new MPI implementation is compared with that of the underlying LAPI itself. The latency (in polling and interrupt modes) and bandwidth of the new implementation are compared with those of the native MPI implementation on RS/6000 SP systems. The results indicate that the MPI implementation on LAPI performs comparably to or better than the original MPI implementation in most cases. Improvements in polling mode latency, interrupt mode latency, and bandwidth are obtained for certain message sizes. The implementation of MPI on top of LAPI also outperforms the native MPI implementation on the NAS Parallel Benchmarks. It should be noted that the implementation of MPI on top of LAPI is not part of any IBM product, and no assumptions should be made regarding its availability as a product.
Similar resources
Performance and Experience with LAPI - a New High-Performance Communication Library for the IBM RS/6000 SP
LAPI is a low-level, high-performance communication interface available on the IBM RS/6000 SP system. It provides an active-message-like interface along with remote memory copy and synchronization functionality. It is designed primarily for use by experienced programmers in developing parallel subsystems, libraries and tools, but we also expect power programmers to use it in end-user application...
Performance Evaluation and Modeling of Reduction Operations on the IBM RS/6000 SP Parallel Computer
We discuss algorithms for global reduction (or combine) operations (e.g., global sums) for numbers of processors that need not be a power of 2, and implement these using standard message-passing techniques on distributed-memory parallel computers. We present performance results measured on an IBM RS/6000 SP parallel computer at UNIC. Significant performance improvements are obtained by using a r...
Benchmark Evaluation of the Message-Passing Overhead on Modern Parallel Architectures
This paper was inspired by an interesting investigation of the performance of MPI on an IBM RS/6000 SP machine [1]. The authors proposed a model for evaluating message-passing overhead and suggested evaluating message-passing performance on as many hardware platforms as possible. In some further investigations such evaluation was extended to other parallel platf...
A comparison of MPI performance on
Since MPI [1] has become a standard for message passing on distributed-memory machines, a number of implementations have evolved. Today there is an MPI implementation available for all relevant MPP systems, a number of which are based on MPICH [2]. In this paper we present a performance comparison of several implementations of MPI on different MPPs. Results for the Cray T3E, the IBM RS/6...
Newton Two-stage Parallel Iterative Methods for Nonlinear Problems
Two-stage parallel Newton iterative methods to solve nonlinear systems of the form F(x) = 0 are introduced. These algorithms are based on the multisplitting technique and on the two-stage iterative methods. Convergence properties of these methods are studied when the Jacobian matrix F′(x) is either monotone or an H-matrix. Furthermore, in order to illustrate the performance of the algorithms ...